We consider the statistical deconvolution problem where one observes $n$ replications from the model $Y = X + \varepsilon$, where $X$ is the unobserved random signal of interest and $\varepsilon$ is an independent random error with distribution $\varphi$. Under weak assumptions on the decay of the Fourier transform of $\varphi$, we derive upper bounds for the finite-sample sup-norm risk of wavelet deconvolution density estimators $f_n$ for the density $f$ of $X$, where $f: \mathbb{R} \to \mathbb{R}$ is assumed to be bounded. We then derive lower bounds for the minimax sup-norm risk over Besov balls in this estimation problem and show that wavelet deconvolution density estimators attain these bounds. We further show that linear estimators adapt to the unknown smoothness of $f$ if the Fourier transform of $\varphi$ decays exponentially and that a corresponding result holds true for the hard thresholding wavelet estimator if $\varphi$ decays polynomially. We also analyze the case where $f$ is a "supersmooth"/analytic density. We finally show how our results and recent techniques from Rademacher processes can be applied to construct global confidence bands for the density $f$.

1. Introduction. Consider the statistical deconvolution model
(1.1) $Y = X + \varepsilon,$

where $X$ is a real-valued random variable with unknown probability density $f: \mathbb{R} \to \mathbb{R}^+$ and $\varepsilon$ is an error term independent of $X$ that is distributed according to the probability measure $\varphi$ on $\mathbb{R}$. The law $P$ of $Y$ equals the convolution $f * \varphi$ and we denote its density by $g$. Let $Y_1, \dots, Y_n$ be i.i.d. replications of $Y$ in the model (1.1) and denote by $P_n$ the associated empirical measure. The deconvolution problem is about recovering the unknown density $f$ from the noisy observations $(Y_1, \dots, Y_n)$. It has been extensively studied: we refer to Carroll and Hall [9], Stefanski [37], Stefanski and Carroll [38], Fan [14, 15], Diggle and Hall [12], Goldenshluger [22], Pensky and Vidakovic [36], Delaigle and Gijbels [11], Hesse and Meister [24], Johnstone et al. [25], Johnstone and Raimondo [26], Bissantz et al. [3], Bissantz and Holzmann [4], Meister [30], Butucea and Tsybakov [7, 8] and Pensky and Sapatinas [35], also to the monograph Meister [31], as well as to Cavalier [10] for a survey of the literature on general inverse problems in statistics, of which deconvolution is a special case.

One key lesson from the aforementioned literature is that a lower bound on the regularity of the error $\varepsilon$ is necessary to be able to estimate $f$ with reasonable accuracy. This lower bound is often quantified by a lower bound on the decay of the Fourier transform $F[\varphi]$ of $\varphi$, and Fourier inversion techniques are applied to construct estimators for $f$.

Most of the literature on this problem (with some notable exceptions, to be discussed below) deals with the $L^2$-theory, that is, involves the loss function $d(\hat f, f) = \int (\hat f - f)^2$, and is often restricted to the case of periodic and hence compactly supported $f$. These restrictions are theoretically convenient, in particular since Fourier analysis-based methods can be used without too much difficulty, using the Parseval-Plancherel isometry. However, a sound understanding of the local behavior of deconvolution estimators seems to be of significant statistical importance. In particular, a theory that could deal with sup-norm loss $d(\hat f, f) = \sup_{x \in \mathbb{R}} |\hat f(x) - f(x)|$ could be used in the construction of confidence bands for the object $f$ of statistical interest. A fortiori, it is not at all clear whether the intuitions from $L^2$-theory carry over to pointwise and uniform loss functions in generality, bearing in mind that $L^2$-convergence properties of Fourier series can give a completely inadequate picture of their pointwise or uniform behavior.

In the present article, we use methods from empirical process theory to derive finite-sample sup-norm risk bounds for deconvolution density estimators based on Fourier inversion with Meyer (or similar band-limited) wavelets. These estimators were studied in Pensky and Vidakovic [36] and Johnstone et al. [25], and have since been successfully used in inverse problems. Our results hold under minimal assumptions on the density $f$ and the distribution $\varphi$: we require $f$ to be bounded, which is unavoidable if one considers sup-norm loss, and we assume that the Fourier transform of $\varphi$ is nonzero on the (Fourier-domain) support intervals of the Meyer wavelet, which is necessary to define any estimator based on Fourier inversion and which also makes $f$ identifiable.
Our risk bounds imply rates of convergence for the deconvolution density estimator that are optimal in global sup-norm loss, without any moment or support restrictions whatsoever, both in the severely ill-posed case (where linear methods suffice) and in the moderately ill-posed case (where we propose a suitable thresholding method). To be more precise, given the law $\varphi$ of the error term and a density $f$ belonging to some Besov body $B(s,L)$ with unknown $s > 0$, we devise purely data-driven estimators $\hat f_n$ such that, for every $n \in \mathbb{N}$,

$$\sup_{f \in B(s,L)} E \sup_{x \in \mathbb{R}} |\hat f_n(x) - f(x)| \le C\, r_n(s, \varphi, L),$$

where $C$ is a constant independent of $n$ and $r_n(s, \varphi, L)$ is the minimax rate of convergence in sup-norm loss over the given Besov body and given the error law $\varphi$. We also obtain a result of this kind for the case where $f$ is "supersmooth," that is, has an exponentially decaying Fourier transform. To the best of our knowledge, the minimax lower bounds derived in this article are also new.

We should note that the main delicate mathematical point in this work is to link the $L^2$-based procedure of Fourier inversion to a pointwise, or even uniform, control of the random fluctuations of the centered linear density estimator; this problem is already implicit in the conditions on $F[\varphi]$ and $f$ imposed by Stefanski and Carroll [38], Fan [14] and Goldenshluger [22], who considered pointwise loss. Even stronger assumptions were imposed in the nice paper Bissantz et al. [3], wherein the limiting (extremal-type) distribution of the uniform deviations over compact sets of certain kernel deconvolution density estimators for $f$ is derived; this is the only result that we are aware of in the literature on deconvolution estimation that deals with sup-norm loss in the moderately ill-posed case (Stefanski [37] deals only with the simpler severely ill-posed case). Our empirical process approach gives results under minimal conditions and also yields the relevant concentration inequalities that allow for a satisfactory treatment of adaptation, which the results in Bissantz et al. [3] do not address. We should note that applying empirical process tools in this setting is not at all straightforward: the usual approach would be to show that certain kernels are of bounded variation and thus the associated sets of translates and dilates are of Vapnik-Chervonenkis type (e.g., Nolan and Pollard [34], Einmahl and Mason [13], Giné and Guillou [16]), but this does not seem viable in the deconvolution problem, because the bounded variation norm does not possess a nice Fourier-analytical characterization. We can, however, solve this problem by combining recent results on VC properties of functions of quadratic variation in Giné and Nickl [19] with Littlewood-Paley theory and the fact that wavelet bases are compatible with both the $L^2$- and $L^\infty$-structure simultaneously; see Lemma 1 for this key result.

Our results can be used to construct confidence bands in the deconvolution 
problem and we discuss this in some detail below, as well as relations to work 
in [3, 4]. We suggest a new approach to nonparametric confidence bands 
based on Rademacher symmetrization, in a similar vein as in recent work of 
Koltchinskii [29]. While these confidence bands may be conservative, they 
allow for an explicit finite-sample analysis under minimal assumptions. 



Let us finally remark that this article also contains new results for the standard density estimation problem (where $\varphi$ equals Dirac measure $\delta_0$ at 0). In this field, our results contribute in several respects: first, Vapnik-Chervonenkis properties of wavelet projection kernels have thus far only been derived for Daubechies wavelets [19] and Battle-Lemarié wavelets [21], and the present article achieves the same for wavelets with compactly supported Fourier transform (e.g., Meyer wavelets). Furthermore, our main adaptation result, Theorem 4, is completely free of any moment conditions and thus shows, as may have been suspected, that the moment conditions imposed in Theorem 8 in [19] are not necessary. Finally, the confidence bands we suggest can also be used for regular wavelet density estimators, and we are not aware of any other results on global confidence bands in density estimation, except for the rather technical ones in [17].

2. Main results. We start with some preliminary definitions and facts. For any Lebesgue integrable function $h \in L^1(\mathbb{R})$, the Fourier transform $F[h]$ of $h$ is defined as $F[h](t) = \int_{\mathbb{R}} h(x) e^{-itx}\,dx$, $t \in \mathbb{R}$, and we use the natural extension of $F$ to $L^2(\mathbb{R})$. We further denote by $F^{-1}$ the inverse Fourier transform, so that $F^{-1}F f = f$ for $f \in L^2(\mathbb{R})$. The Fourier transform of the density $g$ from (1.1) is then given by

(2.1) $F[g](t) = F[f](t)\,F[\varphi](t)$

for every $t \in \mathbb{R}$, where $F[\varphi](t) = \int e^{-itx}\,d\varphi(x)$ denotes the Fourier transform of the measure $\varphi$. Another standard property of the Fourier transform we shall frequently use is its scaling property: for $h \in L^1(\mathbb{R})$ and $a \in \mathbb{R} \setminus \{0\}$, the function $h_a(x) := h(ax)$ has Fourier transform $F[h_a](t) = |a|^{-1} F[h](a^{-1}t)$.
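As a quick sanity check of the scaling identity $F[h_a](t) = |a|^{-1}F[h](a^{-1}t)$, the sketch below approximates both sides numerically for the Gaussian $h(x) = e^{-x^2/2}$, whose transform is $\sqrt{2\pi}e^{-t^2/2}$. The quadrature helper `fourier_transform` and its truncation range are illustrative assumptions, not part of the paper; the identity itself does not depend on the sign convention in the exponent.

```python
import math

def fourier_transform(h, t, lo=-40.0, hi=40.0, m=8000):
    """Trapezoidal approximation of F[h](t) = int h(x) exp(-itx) dx.

    Assumes h is negligible outside [lo, hi]; returns a complex number.
    """
    step = (hi - lo) / m
    total = 0j
    for i in range(m + 1):
        x = lo + i * step
        weight = 0.5 if i in (0, m) else 1.0
        total += weight * h(x) * complex(math.cos(t * x), -math.sin(t * x))
    return total * step

def gaussian(x):
    return math.exp(-0.5 * x * x)

def scaling_check(a, t):
    """Return (F[h_a](t), |a|^{-1} F[h](t / a)) for h = gaussian and h_a(x) = h(a x)."""
    lhs = fourier_transform(lambda x: gaussian(a * x), t)
    rhs = fourier_transform(gaussian, t / a) / abs(a)
    return lhs, rhs

lhs, rhs = scaling_check(2.5, 1.3)
exact = math.sqrt(2 * math.pi) / 2.5 * math.exp(-0.5 * (1.3 / 2.5) ** 2)
print(abs(lhs - rhs), abs(lhs - exact))
```

Both printed discrepancies should be at the level of floating-point rounding; the factor $|a|^{-1}$ is what makes the two sides agree, also for negative $a$.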

Let $\phi$ and $\psi$ be, respectively, a scaling function and the associated wavelet function of a multiresolution analysis. We refer to [23, 32] for the basic theory of wavelets that we shall use freely in this article. The dilated and translated scaling and wavelet functions at resolution level $j$ and position $k2^{-j}$ are defined as $\phi_{jk}(x) = 2^{j/2}\phi(2^j x - k)$, $\psi_{jk}(x) = 2^{j/2}\psi(2^j x - k)$, $j, k \in \mathbb{Z}$. Now, denote by $\langle \cdot, \cdot \rangle$ the inner product in the Hilbert space $L^2(\mathbb{R})$. The density $f$ can be formally expanded into its wavelet series

$$f = \sum_{k \in \mathbb{Z}} \alpha_{jk}(f)\,\phi_{jk} + \sum_{l=j}^{\infty} \sum_{k \in \mathbb{Z}} \beta_{lk}(f)\,\psi_{lk},$$

where the coefficients are given by $\alpha_{jk}(f) = \langle f, \phi_{jk} \rangle$, $\beta_{lk}(f) = \langle f, \psi_{lk} \rangle$, $l, j, k \in \mathbb{Z}$. As is well known, the regularity properties of a function $f$ can be measured by the decay of its wavelet coefficients. We define Besov spaces as follows.

Definition 1. Let $1 \le p, q \le \infty$, $s > 0$, or let $s = 0$ and $q = 1$. Let $\phi$ and $\psi$ be the Meyer scaling function and mother wavelet, respectively (see, e.g., Section 2 of [36] for a definition). The Besov space $B^s_{pq}(\mathbb{R})$ is defined as the set of functions

$$\Big\{ f \in L^p(\mathbb{R}) : \|f\|_{s,p,q} := \|\alpha_{0\cdot}(f)\|_p + \Big\| \big( 2^{l(s + 1/2 - 1/p)} \|\beta_{l\cdot}(f)\|_p \big)_{l \ge 0} \Big\|_q < \infty \Big\},$$

where $\|\cdot\|_p$ are the norms of the sequence spaces $\ell^p(\mathbb{Z})$, with the usual modification in the case $q = \infty$. Moreover, for any $L > 0$, the Besov ball of radius $L$ is defined as $B(s,p,q,L) = \{ f \in L^p(\mathbb{R}) : \|f\|_{s,p,q} \le L \}$.

2.1. Minimax lower bounds over Besov bodies. Before we construct explicit estimators for the density $f$ of $X$ in the deconvolution model (1.1), we derive a result that gives a benchmark for the best performance of any estimator $\hat f_n$. More precisely, we derive lower bounds for the minimax rate of convergence of $\hat f_n - f$ in sup-norm loss, uniformly over Besov bodies of densities $f$, under various assumptions on the error law $\varphi$. We will subsequently show that these lower bounds can be attained by certain wavelet-based estimators and are thus optimal.

To this end, define the minimax $L^\infty$-risk over the Hölder class $B(s,L) := B(s,\infty,\infty,L) \cap \{ f: \mathbb{R} \to [0,\infty),\ \int_{\mathbb{R}} f(x)\,dx = 1 \}$ as

(2.2) $R_n(B(s,L)) = \inf_{\hat f_n} \sup_{f \in B(s,L)} E \sup_{x \in \mathbb{R}} |\hat f_n(x) - f(x)|,$

where the infimum is taken over all possible estimators $\hat f_n$. Note that an estimator in the deconvolution problem means any measurable function of a sample $Y_1, \dots, Y_n$ from density $f * \varphi$ that takes values in the space of bounded functions on $\mathbb{R}$.

We shall make the following assumption on $F[\varphi]$ to establish the lower bounds.

Condition 1. There exist constants $C, C', \alpha > 0$, $w, w' \in \mathbb{R}$ and $t_1, c_0 \ge 0$ such that $F[\varphi](t)$ is differentiable for every $t$ satisfying $|t| > t_1$ and

$$|F[\varphi](t)| \le C (1 + t^2)^{-w/2} e^{-c_0 |t|^\alpha},$$

as well as

$$|(F[\varphi])'(t)| \le C' (1 + t^2)^{-w'/2} e^{-c_0 |t|^\alpha}.$$

This condition is weaker than the standard ones employed in deconvolution problems to establish lower bounds (cf. [8, 14]), where an additional condition is imposed on the second derivative of $F[\varphi]$. It covers the usual candidates for $\varphi$, including the case $\varphi = \delta_0$, which corresponds to classical density estimation ($w = c_0 = 0$).
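For concreteness, the following worked check (our own illustration, not taken from the paper) shows how two standard error laws fit Condition 1; only the moduli of $F[\varphi]$ and of its derivative enter, so the sign convention in the Fourier transform plays no role.

```latex
% Laplace errors with density e^{-|x|}/2 (moderately ill-posed):
F[\varphi](t) = (1+t^2)^{-1}, \qquad
|(F[\varphi])'(t)| = \frac{2|t|}{(1+t^2)^2} \le 2\,(1+t^2)^{-3/2},
% so Condition 1 holds with C = 1, w = 2, C' = 2, w' = 3, c_0 = 0.

% Gaussian errors N(0, \sigma^2) (severely ill-posed):
F[\varphi](t) = e^{-\sigma^2 t^2/2}, \qquad
|(F[\varphi])'(t)| = \sigma^2 |t|\, e^{-\sigma^2 t^2/2}
  \le \sigma^2 (1+t^2)^{1/2}\, e^{-\sigma^2 t^2/2},
% so Condition 1 holds with C = 1, w = 0, C' = \sigma^2, w' = -1,
% c_0 = \sigma^2/2 and \alpha = 2.
```

Plugged into Theorem 1 below, these values give the lower-bound rates $(\log n/n)^{s/(2s+5)}$ for Laplace errors (where $w' = 3 > w = 2 > 0$) and $(\log n)^{-s/2}$ for Gaussian errors.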
The following theorem distinguishes the "moderately ill-posed" case, where $F[\varphi]$ decays only polynomially, from the "severely ill-posed" case, where $F[\varphi]$ decays exponentially fast, and shows that the optimal rates of estimation in the sup-norm depend both on the smoothness of $f$ and on the decay of $F[\varphi]$.

Theorem 1. Let Condition 1 be satisfied. Then, for any $s, L > 0$, there exists a constant $c := c(s, L, C, C', \alpha, w, w', c_0) > 0$ such that for every $n \ge 2$, we have

$$R_n(B(s,L)) \ge c \begin{cases} \left( \dfrac{1}{\log n} \right)^{s/\alpha}, & \text{if } c_0 > 0, \\[2ex] \left( \dfrac{\log n}{n} \right)^{s/(2s+2w+1)}, & \text{if } c_0 = 0 \text{ and } w' > w > 0. \end{cases}$$

One may be interested in replacing the Hölder class $B(s,L)$ by a more general Besov body $B(r,p,q,L)$, $r > 1/p$, of densities. It follows from the proof of Theorem 1 that the minimax rate over $B(r,p,q,L)$ equals the one for $B(s,L)$ with $s = r - 1/p$, and the Sobolev embedding $B^r_{pq}(\mathbb{R}) \subset B^{s}_{\infty\infty}(\mathbb{R})$ will imply that our upper risk bounds derived in the following sections attain this rate. We thus restrict ourselves to $B(s,L)$ without loss of generality.

2.2. Uniform fluctuations of wavelet deconvolution estimators. 

2.2.1. The linear wavelet deconvolution estimator. Recall the model (1.1). We now show, following [36], how one can estimate $f$ from a sample of $P$ by "deconvolving" $P$ or, rather, a suitable approximation of it, on a wavelet basis that satisfies the following condition.

Condition 2. Assume $\phi, \psi \in L^p(\mathbb{R})$ for every $1 \le p \le \infty$, and for some $0 < a' < a$, we have $\operatorname{supp}(F[\phi]) \subset [-a, a]$, as well as $\operatorname{supp}(F[\psi]) \subset [-a, a] \setminus (-a', a')$. Assume, further, that

(2.3) $c(\phi) := \sup_{x \in \mathbb{R}} \sum_{k \in \mathbb{Z}} |\phi(x - k)| < \infty, \qquad c(\psi) := \sup_{x \in \mathbb{R}} \sum_{k \in \mathbb{Z}} |\psi(x - k)| < \infty.$

This condition is satisfied for Meyer wavelets with $a = 8\pi/3$ and $a' = 2\pi/3$ (these choices are not optimal, but feasible); see, for instance, Section 2 in [36]. Other band-limited wavelet bases are also admissible.

If $K(y, x) := \sum_{k \in \mathbb{Z}} \phi(y - k)\phi(x - k)$, then the functions $K_j(y, x) := 2^j K(2^j y, 2^j x)$, $j \in \mathbb{N}$, are the kernels of the orthogonal projections of $L^2(\mathbb{R})$ onto the closed subspaces $V_j \subset L^2(\mathbb{R})$ spanned by $\{\phi_{jk} : k \in \mathbb{Z}\}$. We write, for $x \in \mathbb{R}$ and $j \ge 0$ possibly real-valued,

$$K_j(f)(x) = \sum_{k \in \mathbb{Z}} 2^j \phi(2^j x - k) \int_{\mathbb{R}} \phi(2^j y - k) f(y)\,dy = \int_{\mathbb{R}} K_j(x, y) f(y)\,dy,$$

where the second equality holds pointwise, in view of (2.3).

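Suppose the Fourier transform of the error law $\varphi$ satisfies $|F[\varphi]| > 0$ on $\operatorname{supp}(F[\phi](2^{-j}\cdot))$. We then have, from Plancherel's theorem, that

(2.4)
$$\begin{aligned} K_j(f)(x) &= \sum_{k \in \mathbb{Z}} \phi(2^j x - k)\, \frac{1}{2\pi} \int_{\mathbb{R}} \overline{F[\phi_{0k}](2^{-j}t)}\, F[f](t)\,dt \\ &= \sum_{k \in \mathbb{Z}} \phi(2^j x - k)\, \frac{1}{2\pi} \int_{\mathbb{R}} \overline{F[\phi_{0k}](2^{-j}t)}\, F[g](t)\, (F[\varphi](t))^{-1}\,dt \\ &= 2^j \sum_{k \in \mathbb{Z}} \phi(2^j x - k) \int_{\mathbb{R}} \tilde\phi_{jk}(y)\, g(y)\,dy = \int_{\mathbb{R}} K_j^*(x, y)\, g(y)\,dy, \end{aligned}$$

where the (nonsymmetric) kernel $K_j^*$ is given by

$$K_j^*(x, y) = 2^j \sum_{k \in \mathbb{Z}} \phi(2^j x - k)\, \tilde\phi_{jk}(y)$$

with

(2.5) $$\tilde\phi_{jk}(x) = F^{-1}\left[ \frac{F[\phi_{0k}](2^{-j}\cdot)}{2^j F[\varphi]} \right](x) = \phi_{0k}(2^j \cdot) * F^{-1}\left[ \frac{1_{[-2^j a,\, 2^j a]}}{F[\varphi]} \right](x).$$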



We should note that Young's inequality for convolutions implies, for fixed $j$, that $\|\tilde\phi_{jk}\|_\infty < \infty$, and then also $\|K_j^*\|_\infty < \infty$, which justifies the above operations.

Since we have a sample $Y_1, \dots, Y_n$ from the density $g$, the identity (2.4) suggests a natural estimator of $f$, namely the wavelet deconvolution density estimator

(2.6) $$f_n(x, j) = \frac{1}{n} \sum_{m=1}^{n} K_j^*(x, Y_m), \qquad x \in \mathbb{R},\ j \ge 0.$$
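The following minimal sketch illustrates the structure of (2.6) numerically. It does not use Meyer wavelets: as a simplifying assumption it swaps in the Shannon (sinc) multiresolution analysis, for which $V_j$ consists of functions with Fourier support in $[-2^j\pi, 2^j\pi]$ and (2.6) reduces to the classical sinc-kernel deconvolution estimator $\frac{1}{2\pi}\int_{-2^j\pi}^{2^j\pi} e^{-itx}\,\hat\varphi_n(t)/F[\varphi](t)\,dt$, with $\hat\varphi_n(t) = n^{-1}\sum_m e^{itY_m}$ the empirical characteristic function. The sinc function violates the summability requirement (2.3), so this is an illustration only. The error law (a centred Laplace law with scale $b$, characteristic function $1/(1+b^2t^2)$, symmetric so that $F[\varphi]$ is real and even), the simulation settings and all function names are hypothetical choices.

```python
import math
import random

def simpson(fun, a, b, m=200):
    """Composite Simpson rule on [a, b]; m must be even."""
    h = (b - a) / m
    s = fun(a) + fun(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * fun(a + i * h)
    return s * h / 3

def laplace_cf(b):
    """Characteristic function t -> 1 / (1 + b^2 t^2) of a centred Laplace(b) law."""
    return lambda t: 1.0 / (1.0 + (b * t) ** 2)

def sinc_deconvolution_estimate(x, ys, j, error_cf):
    """Sinc-kernel (Shannon-wavelet) deconvolution estimate of f(x) at level j.

    Computes (1/pi) * int_0^{2^j pi} Re[phi_n(t) e^{-itx}] / error_cf(t) dt; the
    imaginary part integrates to zero because error_cf is assumed real and even.
    """
    n = len(ys)

    def integrand(t):
        re_part = sum(math.cos(t * (y - x)) for y in ys) / n
        return re_part / error_cf(t)

    return simpson(integrand, 0.0, (2 ** j) * math.pi) / math.pi

# Simulated example: X ~ N(0, 1), eps ~ Laplace(0.3), Y = X + eps.
rng = random.Random(12345)
b = 0.3
ys = [rng.gauss(0.0, 1.0) + rng.choice((-1.0, 1.0)) * rng.expovariate(1.0 / b)
      for _ in range(2000)]
f0 = sinc_deconvolution_estimate(0.0, ys, 0, laplace_cf(b))
f5 = sinc_deconvolution_estimate(5.0, ys, 0, laplace_cf(b))
print(f0, f5)   # f0 should be near 1/sqrt(2*pi) ~ 0.399, f5 near 0
```

Dividing by `error_cf` is the "deconvolution" step; increasing `j` widens the frequency window and amplifies noise by roughly $1/\delta_j$, which is the trade-off quantified below.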



2.2.2. Uniform moment and exponential bounds for the fluctuations of $f_n - Ef_n$. We start with some results for the uniform deviations

(2.7) $$\sup_{x \in \mathbb{R}} |f_n(x, j) - Ef_n(x, j)| \le c(\phi)\, 2^j \sup_{k \in \mathbb{Z}} \left| \frac{1}{n} \sum_{m=1}^{n} \big( \tilde\phi_{jk}(Y_m) - E\tilde\phi_{jk}(Y) \big) \right|,$$

where the inequality follows from (2.3). This suggests studying the empirical process indexed by the class of functions $\mathcal{F}_j = \{\tilde\phi_{jk} : k \in \mathbb{Z}\}$. In fact, some further scaling depending on the error distribution $\varphi$ will be useful to obtain a class with constant envelope.
The rather intricate Fourier-analytical definition of $\tilde\phi_{jk}$ in (2.5) makes it difficult to apply standard results from empirical process theory. What is needed is that $\mathcal{F}_j$ be a Vapnik-Chervonenkis (VC-type) class of functions. In the classical density estimation case (where $F[\varphi] = 1$), this follows from results in Nolan and Pollard [34] for translates of a fixed function of bounded variation. We could, however, not control the bounded variation norm of the functions in $\mathcal{F}_j$ for general $\varphi$ in a way that would be useful, mainly because the bounded variation norm does not interact well with Fourier transforms. Recent results by Giné and Nickl [19] show that the bounded variation condition in Nolan and Pollard [34] can be replaced by $p$-variation for general $1 \le p < \infty$, and the case $p = 2$, which corresponds to "quadratic variation," can be linked in a more efficient way to Fourier analysis by using Littlewood-Paley theory.

The following key lemma shows that $\mathcal{F}_j$, suitably normalized, is indeed a VC-type class of functions, under minimal conditions on $F[\varphi]$. Denote by $N(\varepsilon, \mathcal{F}, L^2(Q))$ the $\varepsilon$-covering numbers of a class of functions $\mathcal{F}$ with respect to the $L^2(Q)$-distance.



Lemma 1. Suppose that $\phi, \psi$ satisfy Condition 2 and that $|F[\varphi](t)| > 0$ on $[-2^j a, 2^j a]$. Define

(2.8) $$\delta_j := \min_{t \in [-2^j a,\, 2^j a]} |F[\varphi](t)|$$

(which exists and is positive for every $j$, since $F[\varphi]$ is continuous, $\varphi$ being a probability measure, and nonvanishing on this compact interval). Then the class $\mathcal{H}_j = \{\delta_j \tilde\phi_{jk} : k \in \mathbb{Z}\}$, $j \ge 0$, is uniformly bounded by the constant $U$ and satisfies, for every $0 < \varepsilon < A$, $\sup_Q N(\varepsilon, \mathcal{H}_j, L^2(Q)) \le (A/\varepsilon)^v$ for finite positive constants $A, v, U$ depending only on $\phi, \psi$, where the supremum extends over all probability measures $Q$ on $\mathbb{R}$.



Combining this lemma with moment bounds for empirical processes indexed by VC-type classes of functions in [13, 16], as well as with Talagrand's [39] inequality, we obtain the following result.



Proposition 1. Suppose that $\phi, \psi$ satisfy Condition 2, that $|F[\varphi](t)| > 0$ on $[-2^j a, 2^j a]$, let $\delta_j$ be as in (2.8) and define $j' = \max(1, j)$. Let $f_n(x, j)$ be the deconvolution wavelet density estimator from (2.6) and assume that $X$ has a bounded density $f: \mathbb{R} \to [0, \infty)$. Then there exists a constant $L'$, depending only on $\phi, \psi, p$, such that for every $n \ge 1$, every $j \ge 0$ and $1 \le p < \infty$,

$$\Big( E \sup_{x \in \mathbb{R}} |f_n(x, j) - Ef_n(x, j)|^p \Big)^{1/p} \le \frac{L'}{\delta_j} \left( \left( \frac{G\, 2^j j'}{n} \right)^{1/2} + \frac{2^j j'}{n} \right),$$

where $G = \max(\|g\|_\infty, 1)$. In addition, there exists a constant $C$, depending only on $\phi, \psi$, such that for every $j \ge 0$ and $u > 0$,

(2.9) $$\Pr\left\{ \sup_{x \in \mathbb{R}} |f_n(x, j) - Ef_n(x, j)| > \frac{C}{\delta_j} \left( \left( \frac{G\, 2^j j'}{n} \right)^{1/2} (1 + u)^{1/2} + \frac{2^j j'}{n} (1 + u) \right) \right\} \le e^{-(1+u) j'}.$$



The constant $C$ is unspecified here, although it could be computed explicitly. Obtaining realistic constants is an intricate matter, but one can use symmetrization techniques to circumvent this problem; see Proposition 3 below.
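To get a feel for how $\delta_j$ from (2.8) drives the bound in Proposition 1, the sketch below computes $\delta_j$ on a grid for two illustrative error laws (standard Laplace and standard Gaussian, our choice) with the Meyer value $a = 8\pi/3$ from Condition 2, and evaluates the order of magnitude $\delta_j^{-1}\big((G2^jj'/n)^{1/2} + 2^jj'/n\big)$ of the moment bound, that is, the bound with the unspecified constant $L'$ set to 1. All function names are hypothetical.

```python
import math

MEYER_A = 8 * math.pi / 3  # feasible value of a for Meyer wavelets (Condition 2)

def delta_j(error_cf, j, a=MEYER_A, m=4000):
    """Grid approximation of delta_j = min over |t| <= 2^j a of |F[phi](t)|."""
    T = (2 ** j) * a
    return min(abs(error_cf(-T + 2 * T * i / m)) for i in range(m + 1))

def laplace_cf(t):
    return 1.0 / (1.0 + t * t)      # standard Laplace: polynomial decay, w = 2

def gauss_cf(t):
    return math.exp(-0.5 * t * t)   # standard Gaussian: exponential decay

def deviation_scale(n, j, delta, G=1.0):
    """delta^{-1} * (sqrt(G 2^j j' / n) + 2^j j' / n) with j' = max(1, j)."""
    jp = max(1, j)
    return (math.sqrt(G * 2 ** j * jp / n) + 2 ** j * jp / n) / delta

for j in range(3):
    dl, dg = delta_j(laplace_cf, j), delta_j(gauss_cf, j)
    print(j, dl, deviation_scale(10 ** 6, j, dl), dg, deviation_scale(10 ** 6, j, dg))
```

For the Laplace law the scale grows by a factor of roughly $2^{2j}$ on top of the usual $\sqrt{2^jj/n}$, whereas for the Gaussian law it explodes already at small $j$, which is why only very slowly growing levels are usable in the severely ill-posed case.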



2.2.3. Uniform fluctuations of the empirical wavelet coefficients. The techniques from the previous section allow us to establish similar uniform estimates for the deviations of the empirical wavelet deconvolution coefficients $\hat\beta_{lk}$ from their means. Such results are particularly interesting for the nonlinear thresholding procedures that we shall study below.

We have, for $\psi$ satisfying Condition 2,

$$\beta_{lk}(f) = 2^{l/2} \int_{\mathbb{R}} \psi(2^l x - k) f(x)\,dx = \frac{1}{2\pi} \int_{\mathbb{R}} 2^{-l/2}\, \overline{F[\psi_{0k}](2^{-l}t)}\, \frac{F[g](t)}{F[\varphi](t)}\,dt = \int_{\mathbb{R}} F^{-1}\left[ \frac{2^{-l/2} F[\psi_{0k}](2^{-l}\cdot)}{F[\varphi]} \right](x)\, g(x)\,dx =: 2^{l/2} \int_{\mathbb{R}} \tilde\psi_{lk}(x)\, g(x)\,dx.$$

A natural unbiased estimator of $\beta_{lk} = \beta_{lk}(f)$ is therefore

(2.10) $$\hat\beta_{lk} = \frac{2^{l/2}}{n} \sum_{m=1}^{n} \tilde\psi_{lk}(Y_m),$$

and the object of interest in this subsection is the random variable $\sup_{k \in \mathbb{Z}} |\hat\beta_{lk} - \beta_{lk}|$.

We should note that for wavelets satisfying Condition 2 (e.g., Meyer wavelets), and even if $g$ has compact support, the last supremum is over an infinite set, so empirical process techniques are particularly useful. Lemma 1 and Proposition 1 have the following analogs for $\psi$.



Lemma 2. Suppose that $\phi, \psi$ satisfy Condition 2, that $|F[\varphi](t)| > 0$ on $[-2^l a, 2^l a]$ and let $\delta_l$ be as in (2.8). Then the class $\mathcal{D}_l = \{\delta_l \tilde\psi_{lk} : k \in \mathbb{Z}\}$, $l \ge 0$, is uniformly bounded by a fixed constant $U$ and satisfies, for every $0 < \varepsilon < A$, $\sup_Q N(\varepsilon, \mathcal{D}_l, L^2(Q)) \le (A/\varepsilon)^v$ for constants $U, A, v$ depending only on $\phi, \psi$.

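Proposition 2. Suppose that $\phi, \psi$ satisfy Condition 2, that $|F[\varphi](t)| > 0$ on $[-2^l a, 2^l a]$, let $\delta_l$ be as in (2.8) and define $l' = \max(l, 1)$. Assume that $X$ has a bounded density $f: \mathbb{R} \to [0, \infty)$. Then, for every $n \ge 1$, every $l \ge 0$ and $1 \le p < \infty$, we have

$$\Big( E \sup_{k \in \mathbb{Z}} |\hat\beta_{lk} - \beta_{lk}|^p \Big)^{1/p} \le \frac{L''}{\delta_l} \left( \left( \frac{G\, l'}{n} \right)^{1/2} + \frac{2^{l/2}\, l'}{n} \right),$$

where $L'' > 0$ depends only on $p, \phi, \psi$ and where $G$ is as in Proposition 1. In addition, there exists a constant $D$, depending only on $\phi, \psi$, such that for every $l \ge 0$ and $u > 0$,

(2.11) $$\Pr\left\{ \sup_{k \in \mathbb{Z}} |\hat\beta_{lk} - \beta_{lk}| > \frac{D}{\delta_l} \left( \left( \frac{G\, l'}{n} \right)^{1/2} (1 + u)^{1/2} + \frac{2^{l/2}\, l'}{n} (1 + u) \right) \right\} \le e^{-(1+u) l'}.$$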

2.3. Optimal estimation over Hölder classes. We now show how the risk bounds from the previous section imply optimal rates of convergence for densities $f \in B^s_{\infty\infty}(\mathbb{R})$ in the deconvolution problem, under the standard decay conditions on $F[\varphi]$ from the inverse problem literature.

We first consider the case where the Fourier transform of the error law $\varphi$ decays exponentially fast. In this "severely ill-posed" case, one can find a universal choice of $j$ for which the linear estimator attains the exact minimax rate, even without knowing the value of $s$.

Theorem 2. Suppose that $\phi, \psi$ satisfy Condition 2 and assume that $|F[\varphi](t)| \ge C e^{-c_0 |t|^\alpha}$ for every $t \in \mathbb{R}$ and some $C, c_0, \alpha > 0$. Let $f_n(\cdot, j_n)$ be the estimator defined in (2.6), where $j_n = \frac{1}{\alpha} \log_2(\nu \log n)$ for some $\nu$ satisfying $c_0 a^\alpha \nu < 1/2$. Then there exists a constant $L'''$, depending only on $s, L, \phi, \psi, c_0, C, \alpha, \nu$, such that for every $n \ge 2$, we have

$$\sup_{f \in B(s,L)} E \sup_{x \in \mathbb{R}} |f_n(x, j_n) - f(x)| \le L''' \left( \frac{1}{\log n} \right)^{s/\alpha}.$$
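The choice of $j_n$ in Theorem 2 is calibrated so that $\delta_{j_n} \ge Ce^{-c_0(2^{j_n}a)^\alpha} = Cn^{-c_0a^\alpha\nu}$, because $(2^{j_n}a)^\alpha = a^\alpha\nu\log n$; the constraint $c_0a^\alpha\nu < 1/2$ then keeps $\delta_{j_n}^{-1}n^{-1/2}$ polynomially small. The sketch below checks this arithmetic for a hypothetical Gaussian error with standard deviation 0.1 (so $c_0 = 0.005$, $\alpha = 2$, $C = 1$) and the Meyer value $a = 8\pi/3$; all numerical values are illustrative.

```python
import math

def severe_level(n, alpha, nu):
    """Real-valued resolution level j_n = (1/alpha) * log2(nu * log n) from Theorem 2."""
    return math.log2(nu * math.log(n)) / alpha

def delta_lower_bound(n, c0, a, alpha, nu, C=1.0):
    """C * exp(-c0 * (2^{j_n} a)^alpha), the resulting lower bound for delta_{j_n}."""
    j_n = severe_level(n, alpha, nu)
    return C * math.exp(-c0 * (2 ** j_n * a) ** alpha)

c0, alpha, a = 0.005, 2.0, 8 * math.pi / 3   # N(0, 0.1^2) errors: c0 = 0.1**2 / 2
nu = 1.4
assert c0 * a ** alpha * nu < 0.5            # the constraint of Theorem 2
n = 10 ** 6
print(severe_level(n, alpha, nu), delta_lower_bound(n, c0, a, alpha, nu))
```

Here $c_0a^\alpha\nu \approx 0.49$, so $\delta_{j_n}^{-1}n^{-1/2}$ decays only like $n^{-0.01}$; larger $\nu$ would violate the constraint and let the stochastic term blow up.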

We now turn to the case where $F[\varphi]$ decays polynomially, the so-called "moderately ill-posed" case. Here, the linear estimator $f_n$ is only minimax optimal if one knows the value of $s$.

Theorem 3. Suppose that $\phi, \psi$ satisfy Condition 2 and assume that $|F[\varphi](t)| \ge C (1 + |t|^2)^{-w/2}$ for every $t \in \mathbb{R}$ and some $C > 0$, $w > 0$. Let $f_n(\cdot, j_n)$ be the estimator defined in (2.6) with $j = j_n$ satisfying $2^{j_n} \simeq (n / \log n)^{1/(2s+2w+1)}$. Then there exists a constant $C'$, depending only on $s, L, \phi, \psi, C, w$, such that for every $n \ge 2$, we have

$$\sup_{f \in B(s,L)} E \sup_{x \in \mathbb{R}} |f_n(x, j_n) - f(x)| \le C' \left( \frac{\log n}{n} \right)^{s/(2s+2w+1)}.$$
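The level in Theorem 3 balances the bias, of order $2^{-js}$ for $f \in B(s,L)$, against the stochastic term of Proposition 1, which with $\delta_j \gtrsim 2^{-jw}$ (polynomial decay) and $j' \asymp \log n$ is of order $2^{j(w+1/2)}\sqrt{\log n/n}$. The short check below, with illustrative values $s = 2$, $w = 2$, $n = 10^6$ and all constants dropped, confirms that both terms equal $(\log n/n)^{s/(2s+2w+1)}$ at $2^{j} = (n/\log n)^{1/(2s+2w+1)}$.

```python
import math

def moderate_level(n, s, w):
    """Real-valued j with 2^j = (n / log n)^{1/(2s+2w+1)} (Theorem 3)."""
    return math.log2(n / math.log(n)) / (2 * s + 2 * w + 1)

def bias_order(j, s):
    return 2.0 ** (-j * s)

def stochastic_order(j, n, w):
    # 2^{jw} stands in for 1/delta_j and log n for j'; constants are dropped
    return 2.0 ** (j * w) * math.sqrt(2.0 ** j * math.log(n) / n)

n, s, w = 10 ** 6, 2.0, 2.0
j = moderate_level(n, s, w)
rate = (math.log(n) / n) ** (s / (2 * s + 2 * w + 1))
print(j, bias_order(j, s), stochastic_order(j, n, w), rate)
```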
The question arises as to whether we can achieve this rate of convergence without having to know the value of $s$ in our choice of $j_n$, so that we can adapt to the unknown smoothness $s$ of $f$. This can be done using the wavelet thresholding deconvolution estimator proposed in Johnstone et al. [25] in the periodic setting, defined as follows: for a positive integer $j_1$, to be specified below, the hard thresholding estimator equals

(2.12) $$f_n^T(x) = f_n(x, 0) + \sum_{l=0}^{j_1 - 1} \sum_{k \in \mathbb{Z}} \hat\beta_{lk}\, 1\{ |\hat\beta_{lk}| > \tau \}\, \psi_{lk}(x),$$

where $\hat\beta_{lk}$ was introduced in Section 2.2. The threshold $\tau$ is chosen such that $\tau = \tau(n, l, w, \kappa) = \kappa 2^{lw} \sqrt{(\log n)/n}$, where $\kappa = G\kappa'$, with $G$ from Proposition 1 and $\kappa'$ a "large enough" constant that depends only on $w, C, \phi, \psi$. If $G$ is unknown, then it can be replaced by an estimate, as in [21].
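A minimal sketch of the thresholding rule in (2.12), together with a choice of the top level $j_1$ satisfying the constraint stated in Theorem 4 below, follows. The constant $\kappa$ is left as an input because the text only requires it to be "large enough"; the value used in the example is arbitrary, and the coefficient values are made up for illustration.

```python
import math

def threshold(n, l, w, kappa):
    """tau(n, l, w, kappa) = kappa * 2^{l w} * sqrt(log n / n)."""
    return kappa * 2.0 ** (l * w) * math.sqrt(math.log(n) / n)

def hard_threshold(beta_hat, tau):
    """Keep the empirical coefficient only if |beta_hat| > tau, as in (2.12)."""
    return beta_hat if abs(beta_hat) > tau else 0.0

def max_level(n, w):
    """Smallest integer j1 with 2^{j1} >= (n / log n)^{1/(2w+1)}.

    Then (n / log n)^{1/(2w+1)} <= 2^{j1} <= 2 (n / log n)^{1/(2w+1)}.
    """
    return math.ceil(math.log2(n / math.log(n)) / (2 * w + 1))

n, w, kappa = 10 ** 5, 1.0, 2.0   # kappa = 2.0 is an arbitrary illustrative value
j1 = max_level(n, w)
for l in range(j1):
    tau = threshold(n, l, w, kappa)
    kept = [hard_threshold(beta, tau) for beta in (0.5, -0.03, 0.002)]
    print(l, round(tau, 4), kept)
```

Because $\tau$ grows like $2^{lw}$, the rule keeps fewer coefficients at fine levels, where the deconvolution noise (of order $1/\delta_l$) is largest.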

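Theorem 4. Suppose that $\phi, \psi$ satisfy Condition 2. Suppose that $\varphi$ is such that $|F[\varphi](t)| \ge C (1 + |t|^2)^{-w/2}$ for every $t \in \mathbb{R}$ and some $C > 0$, $w > 0$. Let $f_n^T$ be the thresholded estimator in (2.12) with

$$\left( \frac{n}{\log n} \right)^{1/(2w+1)} \le 2^{j_1} \le 2 \left( \frac{n}{\log n} \right)^{1/(2w+1)}, \qquad j_1 > 0.$$

We then have, for every $n \ge 2$ and every $s > 0$, that

(2.13) $$\sup_{f \in B(s,L)} E \sup_{x \in \mathbb{R}} |f_n^T(x) - f(x)| \le D \left( \frac{\log n}{n} \right)^{s/(2w+2s+1)},$$

where $D > 0$ depends only on $s, L, \phi, \psi, w, C$.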
2.4. Extensions and applications. 

2.4.1. Estimation of a supersmooth density. In the previous sections, we established the minimax rate of estimation of a density in $B^s_{\infty\infty}(\mathbb{R})$ for the sup-norm error, both in the moderately and severely ill-posed cases, and constructed estimators that attain this rate. It was pointed out in [36] for the $L^2$-error that the linear and thresholded estimators attain faster rates of convergence if we consider classes of supersmooth densities instead of the usual Besov spaces. In this section, we investigate this phenomenon for the sup-norm error. We show that the minimax rate of convergence for the sup-norm is the same as that obtained for the $L^2$-error up to an additional $\sqrt{\log\log n}$ factor and that wavelet estimators can attain this rate. For simplicity, and to highlight the main ideas, we only consider the nonadaptive case.
Assume that $f$ belongs to the class of supersmooth densities

$$\mathcal{A}_{c_0,s}(L) = \left\{ f: \mathbb{R} \to [0, \infty),\ \int_{\mathbb{R}} f = 1,\ \int_{\mathbb{R}} |F[f](t)|^2 \exp(2c_0 |t|^s)\,dt \le 2\pi L^2 \right\},$$

where $c_0, s, L > 0$. In the moderately ill-posed case, we have the following result.

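Corollary 1. Let $\phi, \psi$ satisfy Condition 2. Assume that $f \in \mathcal{A}_{c_0,s}(L)$ for some $c_0, s, L > 0$ and that $|F[\varphi](t)| \ge C (1 + |t|^2)^{-w/2}$ for every $t \in \mathbb{R}$ and some $C > 0$, $w > 0$. Let $f_n(\cdot, j_n)$ be the estimator defined in (2.6) with $j = j_n$ satisfying

$$2^{j_n} = \big( \cdots \big)^{1/s}.$$

Then there exists a constant $C'$, depending only on $\phi, \psi, c_0, s, L, C, w$, such that for every $n \ge 3$, we have

$$\sup_{f \in \mathcal{A}_{c_0,s}(L)} E \sup_{x \in \mathbb{R}} |f_n(x, j_n) - f(x)| \le C' \left( \frac{\log\log n}{n} \right)^{1/2} (\log n)^{(w+1/2)/s}.$$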

The rates we obtained for the sup-norm error are similar to those obtained by [7, 8] and [36] for the $L^2$-error, up to the presence of the additional factor $\sqrt{\log\log n}$. This additional factor can be heuristically explained by the presence of the quantity $\sqrt{j}$ in the deviation term $\delta_j^{-1} (2^j j / n)^{1/2}$ derived in Proposition 1, since $j_n$ is of order $\log\log n$ here. The next theorem implies that this $\sqrt{\log\log n}$ factor is indeed necessary.



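Theorem 5. Fix $0 < s < 1$ and $c_0, L > 0$. Assume that $\varphi$ satisfies Condition 1 with $c_0 = 0$ and $w' > w > 0$. There then exists a positive constant $c := c(s, c_0, L, C, C', w, w')$ such that

$$\inf_{\hat f_n} \sup_{f \in \mathcal{A}_{c_0,s}(L)} E \sup_{x \in \mathbb{R}} |\hat f_n(x) - f(x)| \ge c \left( \frac{\log\log n}{n} \right)^{1/2} (\log n)^{(w+1/2)/s}.$$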

We can also obtain a faster rate of convergence in the severely ill-posed case for supersmooth densities, balancing the bias bound from Proposition 4 below with the variance bound from Proposition 1 above. We can then obtain results similar to those in [7, 8], with additional logarithmic terms in the rate of convergence, due to the fact that we consider sup-norm loss instead of $L^2$-loss.
2.4.2. Confidence bands. One of the main statistical challenges in the nonparametric deconvolution problem is the construction of confidence bands for $f$ (cf. [3, 4]). In [3], the exact uniform (over compact subsets of $\mathbb{R}$) limit distribution of certain linear kernel-based deconvolution estimators for $f$ is derived, assuming that $f$ satisfies $\int_{\mathbb{R}} |u|^r |F[f](u)|\,du < \infty$ for some $r > 0$ and that $g$ is once differentiable with bounded derivative, and if the Fourier transform of the error variable decays exactly like a polynomial, that is, $|F[\varphi](t)| \sim C|t|^{-w}$ for some $C > 0$, $w > 0$. If the underlying smoothness $r$ of $f$ is known, then these results can be used to construct asymptotic confidence bands for $f$ that shrink at certain rates of convergence.

We suggest here an alternative approach to confidence bands in the nonparametric deconvolution problem. Instead of extreme value theory, we use concentration inequalities and Rademacher processes. This allows for almost assumption-free results and has the advantage that the confidence band can be shown to be valid on the whole real line and for every sample size $n$. On the downside, these bands are likely to be too conservative in the limit.

One fundamental problem of using concentration inequalities (as in Proposition 1) in practice is that often no reasonable values for the leading constant $C$ are available. To circumvent this problem, we use here an idea that goes back to Koltchinskii [28, 29] and Bartlett, Boucheron and Lugosi [1]; see also Giné and Nickl [21], where this approach was introduced in density estimation. Define a Rademacher process and the associated supremum,

$$\left\{ \frac{1}{n} \sum_{m=1}^{n} \varepsilon_m K_j^*(x, Y_m) \right\}_{x \in \mathbb{R}}, \qquad R_n(j) := \sup_{x \in \mathbb{R}} \left| \frac{1}{n} \sum_{m=1}^{n} \varepsilon_m K_j^*(x, Y_m) \right|,$$

with $(\varepsilon_m)_{m=1}^{n}$ an i.i.d. Rademacher sequence, independent of the $Y_m$'s (and defined on a large product probability space). $R_n$ can be computed in practice by first simulating $n$ i.i.d. random signs, applying these signs to the summands $K_j^*(x, Y_m)$ of the wavelet deconvolution density estimator (2.6), and maximizing the resulting function. Similarly, one can consider $E_\varepsilon R_n(j)$, the expectation of $R_n(j)$ with respect to the Rademacher variables only, which is a stochastically more stable quantity.
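The computational recipe just described can be sketched as follows. The supremum over $x \in \mathbb{R}$ is approximated by a maximum over a finite grid (which can only underestimate $R_n(j)$), and $E_\varepsilon R_n(j)$ by a Monte Carlo average over independent sign draws with the $Y_m$ held fixed. The kernel is passed in as a function of $(x, y)$; in the example a Gaussian kernel is used purely as a hypothetical placeholder for $K_j^*$, whose computation is not shown here.

```python
import math
import random

def rademacher_sup(kernel, ys, grid, rng):
    """Grid approximation of R_n(j) = sup_x |(1/n) sum_m eps_m kernel(x, Y_m)|."""
    n = len(ys)
    eps = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # i.i.d. random signs
    return max(abs(sum(e * kernel(x, y) for e, y in zip(eps, ys))) / n for x in grid)

def expected_rademacher_sup(kernel, ys, grid, rng, reps=50):
    """Monte Carlo approximation of E_eps R_n(j): average over fresh sign draws."""
    return sum(rademacher_sup(kernel, ys, grid, rng) for _ in range(reps)) / reps

def placeholder_kernel(x, y):
    # Hypothetical stand-in for K_j^*(x, y); not the deconvolution kernel itself.
    return math.exp(-0.5 * (x - y) ** 2) / math.sqrt(2 * math.pi)

rng = random.Random(0)
ys = [rng.gauss(0.0, 1.0) for _ in range(200)]
grid = [-4 + 0.1 * i for i in range(81)]
print(rademacher_sup(placeholder_kernel, ys, grid, rng),
      expected_rademacher_sup(placeholder_kernel, ys, grid, rng, reps=20))
```

Averaging over sign draws reduces the extra randomness introduced by the signs, which is the sense in which $E_\varepsilon R_n(j)$ is more stable than a single draw.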

We shall use the fact that this is the supremum of a centered process which can be shown to concentrate around $2E\|f_n(\cdot, j) - Ef_n(\cdot, j)\|_\infty$. To describe the concentration property, recall $\delta_j$ from (2.8) and define the random variable

(2.14) $$\sigma(n, j, z) = 6 R_n(j) + \frac{D_1}{\delta_j} \sqrt{\frac{\|g\|_\infty 2^j z}{n}} + \frac{D_2}{\delta_j} \frac{2^j z}{n},$$

where $D_1 = 10c(\phi)\|\phi\|_1 < 5.7c(\phi)\|\phi\|_1\sqrt{a}$, $D_2 = 11c(\phi)\sqrt{a/2\pi} < 11c(\phi)\sqrt{a}$ and $c(\phi)$ is as in (2.3). If $\|g\|_\infty$ is unknown, it can be replaced by $\|f_n(\cdot, j_n)\|_\infty$ in practice, so that $\sigma$ is completely data-driven. We start with a confidence band $C_n = [f_n(\cdot, j) - \sigma(n, j, z), f_n(\cdot, j) + \sigma(n, j, z)]$ for the mean $Ef_n$ of $f_n$.

Proposition 3. Let $f_n(x, j)$ be the estimator from (2.6) and suppose that $|F[\varphi]| > 0$ on $[-2^j a, 2^j a]$. Assume that $X$ has a bounded density $f: \mathbb{R} \to [0, \infty)$. We then have, for every $n \ge 1$, every $j \in \mathbb{N}$ and every $z > 0$, that



for every $z > 0$, every $n \in \mathbb{N}$, every $j \ge 1$ and some constant $C$ depending only on $\|g\|_\infty, \phi, \psi, z$.

Proposition 3 still holds true when $R_n(j)$ is replaced by $E_\varepsilon R_n(j)$, the expectation of $R_n(j)$ with respect to the Rademacher variables only. (This follows from combining the proof of Proposition 3 with the arguments in the proof of Proposition 2 in [21].)

We did not try to optimize the constants in the choice of $\sigma$, and they are likely to be suboptimal, as they depend on the constants in the lower-deviation version of Talagrand's inequality, where sharp constants are not yet known. A "practical" choice may be to replace the 6 in front of $R_n$ by 4 and to ignore the third "Poissonian" term in (2.14).

We again emphasize that we simply need $|F[\varphi](t)|$ to be bounded from below on the fixed interval $[-2^j a, 2^j a]$ for our results to hold, and we do not need any support or moment assumptions on $f$. In particular, this nonasymptotic result can even be used in principle when $F[\varphi]$ equals zero eventually, by choosing $j$ small enough.

If $f \in B^s_{\infty\infty}(\mathbb{R})$, with $s$ known, then the last proposition can be readily applied for the construction of confidence bands $C_n$ for the unknown density $f$ using undersmoothing (just as in [3]), and these bands can be shown to shrink at the optimal rate of convergence depending on the smoothness of $f$. We do not detail this here, nor do we address the more difficult problem of adaptive confidence bands: using Proposition 3, such results can be obtained in the same way as in the case of density estimation considered in [20].

Instead, and for the sake of illustration, let us construct a nonasymptotic confidence band in the supersmooth case $f \in \mathcal{A}_{c_0,s}(L)$, with $s, c_0$ known and a moderately ill-posed error distribution.




Corollary 2. Let $f$, $\varphi$, $f_n(\cdot, j_n)$ and $j_n$ be as in Corollary 1. Let $\sigma(n, j, z)$ be as in (2.14) above and define the confidence band

$$C_n(x, z) = \big[ f_n(x, j_n) \pm (1 + \delta)\, \sigma(n, j_n, z) \big], \qquad x \in \mathbb{R},$$

where $\delta$ is any positive real number. Then, for every $z > 0$ and every $n \in \mathbb{N}$,

$$\Pr\{ f(x) \in C_n(x, z)\ \forall x \in \mathbb{R} \} \ge 1 - e^{-z} - v_n,$$

where [$c''' = c'''(\phi, \psi, c_0, s)$, as in Proposition 4]

$$v_n := \Pr\left( \sigma(n, j_n, z) < \frac{c''' L}{\delta} \cdots \right)$$

satisfies $v_n \to 0$ as $n \to \infty$.

Moreover, if $|C_n(z)|$ is the maximal diameter of $C_n(x, z)$, then

$$E|C_n(z)| \le C \left( \frac{\log\log n}{n} \right)^{1/2} (\log n)^{(w+1/2)/s},$$

where $C$ depends on $c_0, s, L, \delta, z, \|g\|_\infty$.

Since limt;„ = 0, this confidence band has asymptotic coverage for 6 > 
arbitrary, but more is true: Vn equals zero from some n onward and one can, 
in principle, even obtain coverage for every fixed sample size n by choosing 
5 in dependence of L (and of the constants that define a^). 

3. Proofs. 

3.1. Proof of Theorem 1. Our proof adapts to the present situation stan- 
dard lower bound techniques as in [8, 14, 35]. We recall that the Kullback- 
Leibler divergence between two distributions P and Q is defined by 

«(P|Q) = (/'°Kf)''^- 

[ +00, elsewhere. 

To establish lower bounds for the minimax risk (2.2), we use the following 
lemma (see Theorem 2.5 on page 99 of [40]) — actually, an adaptation of 
it — to the deconvolution problem at hand. 

Lemma 3. Let d he a metric on B{s,L). Let rn he a sequence of positive 
real numbers and let C C B{s,L) he a finite set of probability densities such 
that card(C) > 2 and y f,g G C, f d{f,g) ^ 4r„ > 0. Further, let ip be 

a fixed probability measure and let PJ^^p he the product probability measure 
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corresponding to a sample of size n from the law f * <f, f & C, and assume 
that the KL divergences satisfy, for every f & C and some /o € C, 

Then, 

M sup Ed{fnJ) > cir„, 
fn fee 

where inf j denotes the infimum over all estimators based on a sample of 
size n from the density f * ip and where ci > is a constant that depends 
only on s,L. 

We use this lemma to prove Theorem 1. Let ip be the Meyer wavelet. 
Fix s,L > and let j € N be arbitrary (to be chosen later). Define the 
set of functions C = {fk, k = 0, . . . ,2^ — 1} as follows: consider the standard 
Cauchy density p{x) = l/vr(l + x^), set fo{x) = ^p(^) for r/ > and for any 

= 1, . . . , 2^' - 1, = /o(x) + c'2-j(*+V2)^^.^^^ ^ ^jjg^g ^ gQj^g 
integer M > 1 specified below. We show that the constants rj,c' > can 
be chosen such that fk is a density on R and, in fact, belongs to B{s,L) 
for every k = 1, . . . ,2-^ — 1 and every integer M. Clearly, fk integrates to 
1 since is orthogonal on constants. We next prove fk S B{s,L) for all 
k and suitable c',r/. First, we have ||/o||s,oo,oo < for > 1 large enough 
and depending only on s,L,%Ij,(I), in view of F[fo]{u) = e"''!"!. Definition 
1, |Afc(/o)| = 1(1/2^) /jge-^l^lFfV^zfeK?.)! <2-'/2||V.||ie-|2''^'l^ with a' = 27r/3 
and a similar estimate for afc(/o). Thus, we have, for < c' < L/2, 

||/fc|U,oo,oo < ||/o||.,oo,oo + l|c'2-^'^^+'/'VifcMlL,oo,oo <^ + C<L. 

Having chosen r/, we can choose c' < L/2 suitably small but positive and 
depending on r] and so that > on M for any k. This is easily established 
by using the fact that the Meyer wavelet decays faster at infinity than any 
polynomial [i.e., the estimate IV'C^^)! ^ Cn/{'^ + for every N EN and 

every x € M], whereas foix) decays at infinity like 

To proceed with the proof, we set 7j = c'2~^^^^^^'^\ We first prove the 
separation property in sup-norm for the fkS. For any distinct fk,fk', we 
have Wfk - /fc'lloo = 'yj2^/^U{- - Mk) - - MA;')||oo. By definition of the 
Meyer wavelet, we have, for any k^k' , 

svLY>\'4){x - Mk) - ip{x - Mk')\ = s\XY>\ip{x) - ip{x + M{k - k'))\ 

X X 

>\moo-H{Xra.. + M{k-k'))\ 
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for some Xmax G arg max^, \ tp{x) \ . By the decay property of the Meyer wavelets 
mentioned above, there exists a numerical constant M > 1, large enough 
but finite, such that for any x satisfying \x\ > M, we have |'i/'(a^max + x)\ < 
3/2. Thus, we have, for any k / k' , 



.c 



7i 

We now check the second condition of Lemma 3. Let (Yi, . . . ,1^,) be an 
i.i.d. sample with distribution PJ} admitting the density nr=i(/fc * ^){yi) 
with respect to the Lebesgue measure on M". Fubini's theorem and the fact 
that ^ is orthogonal on constants give, for A; G Z, that J^iijjjk * ^){y) dy = 0. 
Thus, by definition of the Kullback-Leibler divergence and the inequality 
log(l + x) <x for X > —1, we obtain, for any k = 1, . . . ,2^ — 1, that 



K{Pj:\P,-)=n [ log(^{y)){f,*^){y)dy 



= n [ log(l+7,%^(y)V/fc*(^)(y)a!y 
, , Jr \ Jo*V> J 

(3.1) 

< "7, / ii^jk,, * ^){y) f 1 + 7,%^(y)) dy 

Jr V JQ*'^ J 

-^7,- / Ky)dy. 

Jr Jo*^ 

To proceed, we observe that /o being Cauchy implies that (/o * ^){y) > 
ci/(l + y^) for some ci > and every y G M. This is obviously true for y in 
any compact set [— A,74], and for \y\ > A, it follows from 



liminf(l + y^)/o *(/?(?/) > — / liminf — 



d(p{x) = -, 



in view of Fatou's lemma. Consequently, we have 



^%^l^(y)dy<l / {l + y^){^,,,,*^f{y)dy. 

Let us first consider the quantity J^ii^jkM *v)'^iy)dy- Plancherel's theorem 
gives 



{ij,k,,*^riy)dy = C2 / \F[i;,kJ{mF[^]it)\' dt 

(3.2) 

<C32-J' 11-011? / (l + t2)-"'e-2'=ol*l"dt 
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for some constants C2,C3 > depending only on C, vr. 

For the quantity /jj * v)'^{y)dy, we obtain similarly, using in ad- 

dition the spectral representation of the differential operator, that 

{y'4^jkM*'P?iy)dy 

= C2 [ \{F[^p,kJ{t)FMt))fdt 
(3.3) =C2 [ \iF[iP,kJit)Fm)+F[^jk,MF[ip]'m'dt 

JR 

= C2 / \{2--'^/'F[^PokJ{2-H)F[m + F[^^,J{t)F[^]'m'dt 

JR 

<2c42~3jY/" \x^^x^\dx] [ (l + t2)-"'e-2'=ol*l"di 

\Jr J ysupp(F[V,fe^,]) 

Jsupp{F[ipjkjyj]) 

where C4 depends only on C, C, w' , vr. Combining (3.1)-(3.3) and the explicit 
formula for the support of the Meyer wavelet, we can bound K{P^\Pq) by 



/ /■{87r/3)2J f{8n/3)2^ \ 

2-U / (l + t2)-«'e-2co|tr^^+ / (l+t2)-«,'g-2co|tr^^\ 

\J(2n/3)23 J(27r/3)2i / 



where C5 > depends only on C,C", w',7r, ||V'||i;/r dx. It remains to 

estimate the size of these integrals and select j appropriately and we distin- 
guish the moderately and severely ill-posed cases. 

In the moderately ill-posed case (cq = 0, w' >w >0),we have K{PJ}\Pq) < 
CQ{c')'^n2~^^'^^~^'^'^^^^ for some constant cq > independent of n, j. Taking 
2^ ~ (n/logn)^/^^*"*"^"'"'''^) and c' > small enough (independent of n and 
j) in the definition of 7^ gives K{P^\Pq) < C6(c')^(logn) < ^ log(card(C)), 
where we recall that card(C) = 2^ . The separation rate r„ for this choice of 
jn becomes, for any k, k' distinct, 

/logn\^/(2^+=''"+^) 
ll/fc - /fc'lloo > cri j :=r„ 

for some constant C7 > independent of n. This proves Theorem 1 for the 
moderately ill-posed case. 

For the severely ill-posed case (cq > 0), we similarly obtain that K{P'I^\Pq) < 
C8(c')^n2"^'=('^'"'''"')2-'^o2^" with do = (2co(27r/3)")/log2 and constants cs > 
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0, c{s,w,w') independent of n,j. Taking j„a = log2(3^ log2 n) with z/ > 1 
large enough gives 

K{P^\P^) < C9(c')'(log2n)^'('^'-'-')ni"- < log(card(C)), 

where cg > 0, c'(s,w,a) are nonnegative constants independent of j,n. For 
this choice of jn, the separation rates r„ become, for any k,k' distinct, 

/ 1 \«/" 

ll/fc - /fc'lloo > ClO f 



^ log n ^ 

where cio > is independent of n. This concludes the proof of the theorem. 
3.2. Proofs of VC properties. 
Proof of Lemma 1. Set 



r]j{x)=F ^^l[„2Ja,2Ja]=^)(3;), 



which is bounded and continuous, and rewrite 
4>jkix) =(j)Qi,{2^-) *r]j{x) 

4>{2^x-2^y-k)r^j{y)dy 

2-^/^cPjoix-y-2-^k)iy{y)dy 

= 2-^/^<l)jO*Vj{x-2-^k) 

so that it is sufficient to study the class consisting of translates of the fixed 
function 2~^^'^(pjQ * r]j. First, note that 5j<j)jk, k GZ, is uniformly bounded 
in view of the last estimate and since 

(3.4) {2-^/Hj)Ujo * ??,||oc < {2-^/^6j)Ujo\\2H\\2 < V2^/27r, 

where we have used Young's convolution inequality and Plancherel's theo- 
rem. 

To prove the entropy bound, we will show that <pjo*f]j has finite quadratic 
variation (i.e., 2- variation) . In fact, to obtain a bound on the quadratic 
variation that is independent of j, we renormalize and show that the function 
{2~^ /'^5j)(j)jQ * Tjj has quadratic variation bounded by a constant D that 
depends only on (j). This will complete the proof of the lemma by using 
Lemma 1 in [19], which states that the set of dilates and translates of a fixed 
function h of bounded variation, 1 < p < oo, is of VC-type with constants 
A,v depending only on p and the p- variation norm of h. 
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We will prove that {2~^^'^6j)(f)jQ * rjj has bounded quadratic variation by 
showing that it is contained in the (homogeneous) Besov space i?2i (K), 

■ 1/2 

which is sufficient, in view of the continuous embedding of i?2i (^) i^^o the 
space y^(R) of functions of quadratic variation (a result due to Peetre — see 
Theorem 5 in [5] for a proof, also the proof of Theorem 2 in [33] , which applies 

■ 1/2 

to p = 2 as well). The seminorm || • ||i/2,2,i of -^21 (^) following 
Littlewood-Paley characterization: 



ll/: 



2,2,1 =E2'^'ll^"'[^^^[^]]l|2' 



where 7; is a dyadic partition of unity with 7/ supported in [2'~^,2'+^] (see, 
e.g.. Theorem 6.3.1 and Lemma 6.1.7 in [2]). We bound the Littlewood-Paley 
norm: using the fact that F[2"j/2^jo] = 2"JF[0](2--'-) and Plancherel's the- 
orem, introducing the notation (u) = (1 + |up)^/^ and in view of the support 
of 7/, we have the bound 



5,-^2'/2||F-H7/^[2"^'/'0,o*^i]]|l2 



27r ^ ^ 



i{u) 



1/2 



'^•)lh2.a,2.a](i^M)"^|^ 



<c2"^-<5,^||7zF[</<](2-^-.)lh2^a,2.a](^M)"'(^)' 



/2|| 

2 



I V J-'2^a 

<c(a)2-^-/2^||^-i[^^^[^](2-..)]||^ 
I 

= c(a)J^ ||F-i[7,F[0,o]]||2<c(a)||<A,o||o,2,i. 



To bound the last quantity, we use the inequality || • ||o,2,i ^ || • ||o,2,i (which 
follows from Definition 1 and results in [32], Section 6.10). By orthogonality 
of the wavelet basis (j G N, without loss of generality). 



0j0||0,2,l 



The first term on the right-hand side is bounded by ||Ko(0jo)||2 < ll^jolb < 1 
since Kq is an L^-projection. For the second term, we note, writing for 
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ip{- — k) and using the change of variables 2^x = u and Condition 2, that 



k k 



< 2' 2^^' sup 
k 

<C2(^,0)2'-^- 



so that 



1=0 



1=0 



This shows that 2~^/'^6j\\(j)jo * 'i]j\\i/2,2,i is bounded by a fixed constant that 
depends only on which completes the proof of the entropy bound. The 
proof of Lemma 2 is the same (in fact, it is simpler since, in the last step, 
by orthogonality, only the resolution level / has to be considered). □ 

3.3. Proofs of Propositions 1 and 2. 

Proof of Proposition 1. We recaU (2.7) and observe that Tij is 
bounded by the fixed constant U. We prove j > 0; the case j = is the 
same, except for notation. Using the moment inequality (57) in [19] and 
Lemma 1, we obtain 



ii^sup 

kez 



2J 



m=l 



2J 
6jn 



-E 



Y,{h{Yn,)-Eh{Y)) 



m=l 



< 



< 



C{v)2^ 
6jn 



, ^ AU ^ AU\ 
(T\jn log + log I 



C{v,A,U) 



n n 



where cr^ > sup^^g-^, EhP'iY) is obtained as follows: using Plancherel's theo- 



rem, 



Eh\Y) = 6^ / 4>%ix)gix)dx<5^\\g\ 



oo||<Pjfc||2 
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j22-2i||„IL„ / 

27r 



u)\ du 



< ^2-2^||<7||oo /' " \F[^ok]i2-^u)\^du 



2-K 



<l-2-'J\\g\\^j \F[(t>Qk]{v)? dv 

= 2-i\\g\\^<2-^G^ = a\ 

a bound which does not depend on h. The claim for general p follows from 
standard arguments for uniformly bounded empirical processes, using, for 
instance. Proposition 3.1 in [18]. 

We now prove the second statement. For every u' > 0, Talagrand's inequal- 
ity in Bousquet's version [6] applied to Z = X^m=i(^(^m) ~ ^^0^))\\'Hj 
yields 



Now, the first statement of the proposition and taking u' = {1 + u)j' imply, 
after some elementary computations, that 



<e-(i+n)/, 
which completes the proof. □ 

Proof of Proposition 2. The proof is the same as that of Proposition 
1 (up to some obvious modifications). □ 

3.4. Proofs of Theorems 2 and 3. First, consider Theorem 2. The bias is 
^nv\f{x) - Efn{x,3n)\ = \\f - K,^U)\\oo<Ci2~^-' <C[(^-\ ' , 

where > depends only on ||y^||s,oo,oo (see Theorem 9.4 in [23]). For the 
"variance" term, Proposition 1 and our choice for j„ give 

Ssup|/„(X, j„) -Efn{x,jn)\ 



< L""e 



n /7~i M/alogali'logn) ^ /^log2(i^logn)\ 

G\/ (z^logn)va l-(i^logn) ' 

V an an J 
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< L""'G 



n 



n 



(log n)^/" log log 



n = o 



logn 



s/a 



Using Proposition 1 and the above bias- variance decomposition, the proof 
of Theorem 3 is similar to that of Theorem 2 and is left to the reader. 



3.5. Proof of Theorem 4- For simplicity of notation, we suppress the 
suprema over B{s, L) in most of what follows — uniformity of the bound fol- 
lows from tracking all of the constants involved and noting that any density 
in B{s,L) is bounded by a fixed constant U that depends only on s,L. We 
have 

sup i?||/J-/||oo< sup Esup\fn{y,0)-Efn{y,0)\ 

feB{s,L) f&B{s,L) yeM 



+ sup E 

f&B{s,L) 



il-1 



1=0 k 



+ sup \\Kj,{f)-f\U 

f&B(s,L) 

The first term in the right-hand side is treated in Proposition 1, which 
implies that sup f ^b(s,l) Esupy^^\fn{y,0) - Efn{y,0)\ < Cy/TJn, which is 
of smaller order than the right-hand side in (2.13). For the third, "deter- 
ministic," term, we have, from standard approximation results for wavelets 
(Theorem 9.4 in [23]), (/) - /||oo < c(L)2-Ji^ < c'(L)((logn)/n)"/(2^+i), 
which is again of smaller order than the right-hand side in (2.13). 

The quantity inside the expectation of the supremum of the second term 
can be decomposed, for any / G B{s,L), as 

ii-i 

1=0 k 



l/3ife|<T,|/3ifc|>2r + ^|/3ifc|<T,|/3ifc|<2r 



1=0 k 



and we denote these terms (I)-(IV). 

We first treat the "large deviation" terms (II) and (III). For (II), using 
(2.3) and the Cauchy-Schwarz inequality, we have 



-Esup 



ii-i 



1=0 k 



(3.5) 



< E 



^sup|/3ifc-/3zfc|supl|^^^ 



/2 



>T,\lilk\<T/2 



1=0 



sup V|V';fc(y)| 
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1/2 



< ^2'/2c(V')psup|Afc-Afc| 

1=0 ^ 



i?sup 1 



l/3ifcl>r,|Afc|<T/2 



1/2 



We have, using the second part of Proposition 2, choosing k! large enough 
depending only on a, w, C, 4>, and using the fact that (2'//n)^/^ is bounded 
by a fixed constant independent of /, 

^^^Pl|A.|>r,|A,|<r/2 



■Afc|>T/2 



(3.6) 



<Pr(sup|A.-A.|>^C2-«-/^) 



< Pr 



:^sup|Afc - /3ik\ > c{a,w,C)K'G ^ 



6i\l n 



Now, combining (3.5) and (3.6) with the first part of Proposition 2 yields 
the bound 

ii-i 



2'(«'+(V2))(^^i^g-logn ^ (^/(g^- log n ^/ ^Og ^ on (io+1/2) 

n 

for (II). 

For term (III), using (3.6), as well as J2k lAfcl — c(V')2'^^ for any density 
/, we have 



ii^sup 



1=0 k 



!fel<r,|ftfe|>2T 



< ^ 2'/2||V.||oo J] m PrdAfcl < T, lAfcl > 2r) 



i=0 



< C""e-2l°g" ^ 2' < C""n"2(n/logn)V(2«,+i) ^ o(n"i/2). 
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We now bound (I). Let ji(s) be such that < ji(s) < Ji — 1 and 
(3.7) ~ (n/logn)i/(2«+2«,+i) 

[such ji(s) exists by the definitions]. Proposition 2 and (2.3) give 

ji{s)-l 
/=0 k 

< ^ Esup|Afc-AA:|2'/2c(V) 



z=o 



2^' 




n 



n 



s/{2s+2w+l) 



where D" > depends only on i{j,(j),C,w. For the second part of (I), using 
the fact that Definition 1 implies 



(3.8) 



sup\(3ik{f)\ < DiL)2- 

k 



-«(s+l/2) 



for / G B{s,L), the definition of r and Proposition 2, we obtain 



E'sup 



ii-i 

Y Y.^hk-Pik)i^ik{y)i 

l=ji{s) k 



/3ifc|>r,|Afc|>r/2 



< Y Esup\Pik-Pik\-2' 



Iw 



ii-i 



n 



< D'" Y 2"'" < D'' 



logn 



n 



sup|Afc|2'/'c(V) 

logn k 



s/(2s+2w+l) 



where D"" depends only on L, s, k', 0, V', C*. 

To complete the proof, we control the term (IV). Again using (3.8), we 
have 

ii-i 



(3.9) 



1=0 k 

<C(V) J]sup2'/2|/3;fc|l|^^^|<2, 

z=o ^ 
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/=0 ^ ^ 



Since the antagonistic terms in the minimum are strictly monotone in I, the 
/* G M for which they are maximal is the one where they are equal so that 
2'* ~ [cf. (3.7)]. If we denote by [/*] the integer part of I*, then the last 
sum is bounded by 



1=0 l=[l*]+l \ J 



3.6. Proofs for Section 2.4- The following proposition is the wavelet- 
analog of a similar result in Proposition 1 in [7] for kernel regularizations. 

Proposition 4. Let </>, satisfy Condition 2. Let f G Aco,s{^) f^'^ some 
Co, s, L > 0. We then have, for every j > 0, that 

\\K,{f) - /lU < c'"VL2^^'-^y'e-'-(-'r2^\ 
where the constant d" > depends only on (/),^,co,s. 

Proof. Using (2.3), Plancherel's theorem and the fact that / G AcQ^siL), 
we have 



\K,if)-f\\^<c{^P)^2'/^snp\|3M\ 
i>j '^ez 

= c'V2'/2sup / F[i,ik]{u)Ff{u)du 
<c'^sup / \Fm2-'u)\\Ff{u)\du 



i>j ^-e^ 



<c'||^||iV / |F/(u)|e^'''"l'e-'^°'"l'dn 



< c"U\\iVlY^ J e-25o«= du 



and the result follows from the inequality e"'^"^ du < C(c, s)a^~*e~'^"^ for 
a,s>0. □ 

Proof of Corollary 1. Decomposing the sup-norm error of the lin- 
ear estimator into "bias" and "variance" terms and applying Propositions 1 
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and 4, we have, for any j > 0, 
Esup\fn{x,j) - f{x)\ 

<SUp\Efnix,j) - f{x)\+ Esnp\fnix,j) - Efn{x,j)\ 

< c'" VZe-^oK)»2-2.(i-s)/2 ^ 1 ^ fW ^ 

Oj \ \ n n J 

where C" > depends only on C, s, L, cq, a, w. The result follows immediately 
for s > 1 and for s < 1 in view of the fact that (1 — s)/2s < (w + 1/2) /s for 
all s > 0. □ 

Proof of Theorem 5. The proof of this theorem follows the one of 
Theorem 1 up to the following modifications. Let p be the standard Cauchy 
density. Fix < z/ < 1/2. Since F\p](u) = e"'"', we see from the scaling 
property of Fourier transforms and since s < 1 that there exists a constant 
■q = ri{u) large enough such that /o = {l/i])p{'/'n) £ •^co,s{i^'^L). 

As in the proof of Theorem 1, we consider the functions fkix) = foix) + 
Iji^jkM^ l<k<2^ - 1, kM = kM, M > 1 with = c'^^/j2^'^e-^''^''"+^^'^'\ 
We have fk G AcQ,siL) for every A: if c' > is a constant taken small enough 
and depending only on i/, a, ||^/'||i since 

/ |F[A](t)|2e2£oirdt 
JR 

<2 / |F[/o](t)pe2^»l*l^dt + 27| / \F[i^,,km\'e^'"^'^' dt 

JR JR 

<A-Ku'^L + 2^p~^U\\l / e^^°\'\^ dt 

Ja'2i 

< Attu'L + 2(c')'Li22^''"e-2^«['^^+i]2- \\^\\2a2^ ^^coa'^^^ 

< 2-kL. 

Take 2^^ = 2co[a'>+i] ^og"-- The proof of Theorem 1 then implies, V/c 7^ k', 

ll/fc — /fc'lloo > C3^y (log logn)/n(logn)('"+-^/^)/** for some constant C3 > in- 
dependent of n. Next, for any k, the Kullback-Leibler divergence between 
PJ} and Pq satisfies 

K{PJ}\P^) < Cin-fp-^^"" = C4(c')^i.nj22j"'e-2^«['^°+^l2^''2-2j"' < c^{c')^Lj. 
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This and Lemma 3 together yield the result for c' > chosen small enough 
independently of n, fc. □ 



Proof of Proposition 3. We use Proposition 5 below. Note that 



(3.10) \\f^{j)-Ef„ 



- sup 



1 

-y^{K*{x,Y^)-EK*{x,Y)) 



m=l 



The class {K*(x, •) : x G M} has envelope C/(j) = 2^ c{4))^J~al2^ in view of 
(2.3) and (3.4). Since Proposition 5 deals with classes of functions bounded 
by 1/2, we have to rescale, that is, we consider the class Q := Qj = {K*{x, •)/ 
2U{j):x G M}, which is uniformly bounded by 1/2. Furthermore, the up- 
per bound for the weak variances supg^g Eg'^{Y) < cj^ can be taken to be 
2~-'(7r/2)||(/>||^||(7||oo in view of the estimate 

E{KUx,Y)f < 2^||5||ooc(<A)'||</'jo||?||r/,||i 



< 



1511 



l6f2^{aM 



which uses Young's inequality (and the definition of rjj from the proof of 
Lemma 1). 

To prove the inequality, set d{(j)) = c((/>)-\/a/27r^ and (i'((/)) = (i((/))||0||i\/27r 
so that 

Pr|||/„(j,.)-ii;/„(i,.)||oo 



> 6i?„(j) + j 2^\\g\\^{z + log2) ^ 44 2^d{(l)){z + log 2) 



:Pr< 



1 " iK*{;Y„ 
n ^ 

m=l 



n 



6, 



n 



EKU;Y)) 



2C/(j) 



(z + log2) , ^^z + log2 

l\l ^TT-i H 



2J+1 



n 



n 



but this quantity equals the probability in Proposition 5 below for T = Q. 

For the second claim of the proposition, we only have to show that ERn{j) 
has, up to constants, the required order as a function of j,n. But this fol- 
lows readily from the usual desymmetrization inequality for Rademacher 
processes (cf., e.g., expression (23) in [21]), as well as from Proposition 1. 
□ 



Proof of Corollary 2. The result follows from standard arguments 
(combining Propositions 3 and 4). □ 
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3.7. A concentration inequality using Rademacher processes. We start 
with the following inequality, which is a Bernstein-type version of simi- 
lar inequalities in [29] and complements the results in [21]. Let = 
supjgjr \H{f)\ for any set F and functions i7 : J^— M. 

Proposition 5. Let Xi,...,X„, he i.i.d. with law P on a measurable 
space {S, A) . Let F be a countable class of real-valued measurable functions 
defined on S, uniformly bounded by 1/2, and let cr^ > supjgjrii^/^(X). We 
have, for every n G N and x > 0, that is greater than or equal to 



Pr. 



\jZU{x.)-Pf) 

=1 

1 " 

-Y.^^f{X■, 



i=l 



i=l 



T 

+ 10 



(3; + log2)q-2 ^ ^^ a; + log2 



n 



n 



Proof. We first recall the lower-deviation version of Talagrand's in- 
equality, as given in [27], and a simple consequence of it. Using the notation 
Z = II — -P/)||j^, we have, using the inequalities ^fa^Tb < -y/a + 

and \/a6 < (o + &)/2, that 



e"^' > Pr{Z <EZ- ^/2x{na^ + 2EZ) - x} 
> Fr{Z < 0.5EZ - \/2xna^ - 3x} 



Pr 



f 1 " 

I i=l 

n 

-Y.^f{x.)-pf) 



< 0.5E 



1=1 



2x0"^ 3a; 
n n 



and one likewise proves, using the upper-deviation version of Talagrand's 
inequality [6], 



(3.11) 



f 1 " 

I i=i 



> l.bE 



i=l 



2x0-2 7x 
n in 



To prove the proposition, observe that 
Prj >6 -Y,e,f[X, 



+ 10 



xcr2 22x 1 
n j 
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< 



T 



> 3E 



n ^-^ jr V n n J 



J" 

22x ' 



< -8.5a/ 0.85 

n n 



<Pr{ 



> 1.5E 



n 



{f{X,)-Pf) 



+ 



2x0-2 



n 



^ 7x 
3n 



+ Pr| <0.5^ ly^SifiX, 





/2xcr2 




T 


V n 


n J 



where we have used the standard Rademacher symmetrization inequahty 
(e.g., (23) in [21]). The first quantity on the right-hand side of the last 
inequahty is less than or equal to e~^, by (3.11). For the second term, note 
that the first displayed inequality in this proof also applies to the randomized 
sums Er=i^i/(^0> by taking g = {g{T,x) = Tf{x):f G J"}, r G {-1,1}, 
instead of T and the probability measure P = 2~^{6-i + 6i) x P instead of 
P. It is easy to see that a can be taken to be the same as for J^. This gives 
the overall bound 2e~^ and a change of variables in x gives the final bound. 
□ 
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